@qianheng-aws qianheng-aws commented Nov 13, 2025

Description

This PR includes changes:

  1. Implement steps 1 and 2 (replace fields and literals with parameters) described in the RFC "[RFC] RexNode standardization for script push down" (#4757). This should give the script cache a higher hit ratio.
  2. Remove ROW_TYPE and EXPR_MAP from the script. This reduces the average script size by a factor of 2 to 5.
  3. Exclude OpenSearchRequestBuilder when computing the digest for OpenSearchIndexScanOperator, while keeping it when generating the explain plan.
  4. Remove OpenSearchRequestBuilder from PushDownContext and perform the related actions lazily. Given change 3, holding that object in each PushDownContext is less valuable.
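
To illustrate why change 1 can raise the cache hit ratio, here is a minimal sketch. It is not the PR's implementation: `Comparison`, `standardize`, and the `?0`/`?1` placeholder syntax are hypothetical stand-ins, and the real code works on Calcite RexNode trees. The point is only that two filters differing in field name or literal collapse to the same template once those values move into parameters, so they can share one cache entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScriptStandardizationSketch {
  // Hypothetical stand-in for a pushed-down expression: <field> <op> <literal>.
  record Comparison(String field, String op, Object literal) {}

  // Standardized form: a template with placeholders plus the extracted values.
  record Standardized(String template, List<Object> params) {}

  // Replace the field and the literal with positional placeholders.
  static Standardized standardize(Comparison c) {
    List<Object> params = new ArrayList<>();
    params.add(c.field());
    params.add(c.literal());
    return new Standardized("?0 " + c.op() + " ?1", params);
  }

  public static void main(String[] args) {
    // Toy script cache keyed by the standardized template.
    Map<String, String> cache = new HashMap<>();
    List<Comparison> filters =
        List.of(new Comparison("age", ">", 35), new Comparison("balance", ">", 1000));
    for (Comparison filter : filters) {
      Standardized s = standardize(filter);
      cache.computeIfAbsent(s.template(), t -> "compiled(" + t + ")");
    }
    // Both filters share one entry because only their parameters differ.
    System.out.println(cache.size());
  }
}
```

Without standardization, the field name and literal would be part of the cache key, so each of the two filters above would occupy its own entry.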

Related Issues

Partly resolves #4757

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Heng Qian <[email protected]>
…ardization

# Conflicts:
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
#	integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_regexp_match_in_where.json
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Comment on lines +207 to +211
return switch (sources.get(index)) {
  case DOC_VALUE -> getFromDocValue((String) digests.get(index));
  case SOURCE -> getFromSource((String) digests.get(index));
  case LITERAL -> getFromLiteral((Integer) digests.get(index));
};
Collaborator

Could you add a developer doc explaining the spec of the pushdown script? After encoding, it is not easy to read, I guess.

Collaborator Author

Added in 1278054

Collaborator

@penghuo penghuo Nov 14, 2025

Thanks. It is very clear.

Nit: Is the LITERALS array necessary? Could DIGESTS and LITERALS be combined?

        "params": {
            "utcTimestamp": 17630261838681530000,
            "SOURCES": [0, 2, 2, 1],
            "DIGESTS": ["age", 0, 1, "email"],
            "LITERALS": [35, "u35"]
        }

vs

        "params": {
            "utcTimestamp": 17630261838681530000,
            "SOURCES": [0, 2, 2, 1],
            "DIGESTS": ["age", 35, "u35", "email"]
        }
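
The two layouts can be compared with a small sketch. It assumes, based on the switch statement under review and the example values, that source code 0 means doc value, 1 means `_source`, and 2 means literal. That mapping, the class name, and both method names are assumptions for illustration, not the PR's code, and the field lookups are stubbed to return descriptive strings.

```java
import java.util.List;

class ScriptParamLayoutSketch {
  // Assumed codes inferred from the example: 0 = doc value, 1 = _source, 2 = literal.
  static final int DOC_VALUE = 0, SOURCE = 1, LITERAL = 2;

  // Separate layout: for a literal, the digest is an index into LITERALS.
  static Object resolveSeparate(
      int i, List<Integer> sources, List<Object> digests, List<Object> literals) {
    return switch (sources.get(i)) {
      case DOC_VALUE -> "doc[" + digests.get(i) + "]";
      case SOURCE -> "_source[" + digests.get(i) + "]";
      case LITERAL -> literals.get((Integer) digests.get(i));
      default -> throw new IllegalArgumentException("unknown source: " + sources.get(i));
    };
  }

  // Combined layout: for a literal, the digest is the literal value itself.
  static Object resolveCombined(int i, List<Integer> sources, List<Object> digests) {
    return switch (sources.get(i)) {
      case DOC_VALUE -> "doc[" + digests.get(i) + "]";
      case SOURCE -> "_source[" + digests.get(i) + "]";
      case LITERAL -> digests.get(i);
      default -> throw new IllegalArgumentException("unknown source: " + sources.get(i));
    };
  }

  public static void main(String[] args) {
    List<Integer> sources = List.of(0, 2, 2, 1);
    List<Object> separateDigests = List.of("age", 0, 1, "email");
    List<Object> literals = List.of(35, "u35");
    List<Object> combinedDigests = List.of("age", 35, "u35", "email");

    for (int i = 0; i < sources.size(); i++) {
      Object a = resolveSeparate(i, sources, separateDigests, literals);
      Object b = resolveCombined(i, sources, combinedDigests);
      System.out.println(i + ": " + a + " / " + b + " equal=" + a.equals(b));
    }
  }
}
```

In this sketch both layouts resolve every position to the same value; the combined form simply drops one level of indirection for literals.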

Signed-off-by: Heng Qian <[email protected]>
…ardization

# Conflicts:
#	integ-test/src/test/resources/expectedOutput/calcite/explain_eval_min.yaml
#	integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_eval_min.yaml
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@yuancu yuancu requested a review from penghuo November 14, 2025 07:21
Signed-off-by: Heng Qian <[email protected]>
  SerializationWrapper.wrapWithLangType(
-     ScriptEngineType.CALCITE, serializer.serialize(rexNode, rowType, fieldTypes));
+     ScriptEngineType.CALCITE,
+     serializer.serialize(rexNode, rowType, fieldTypes, sources, digests, literals));
Collaborator

Why include sources, digests, and literals as parameters of the serialize() function and have the client create empty arrays?

Collaborator

I see. It is required when creating the Script on L1504.
I also found that sources, digests, and literals are exposed and used in multiple places without encapsulation, e.g. ScriptDataContext and standardizeRexNodeExpression.

Can we encapsulate our script protocol in a class? E.g.

class ParameterBindings {
   void putValue(String name, Object value)
   Object getValue(String name)
}
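
One possible shape for such a class, filling in the outline above, is sketched here. It is illustrative only: the backing map, the failure on unknown names, and the `toParams` method are assumptions and not part of the PR.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ParameterBindings {
  // Insertion-ordered so the emitted params keep a stable key order.
  private final Map<String, Object> values = new LinkedHashMap<>();

  void putValue(String name, Object value) {
    values.put(name, value);
  }

  Object getValue(String name) {
    if (!values.containsKey(name)) {
      throw new IllegalArgumentException("no binding for: " + name);
    }
    return values.get(name);
  }

  // Hypothetical helper: read-only view suitable for passing as script params.
  Map<String, Object> toParams() {
    return Collections.unmodifiableMap(values);
  }

  public static void main(String[] args) {
    ParameterBindings bindings = new ParameterBindings();
    bindings.putValue("SOURCES", List.of(0, 2, 2, 1));
    bindings.putValue("DIGESTS", List.of("age", 0, 1, "email"));
    bindings.putValue("LITERALS", List.of(35, "u35"));
    System.out.println(bindings.toParams().keySet());
  }
}
```

With a wrapper like this, callers would pass a single object instead of three parallel lists, and the key names would live in one place.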

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[RFC] RexNode standardization for script push down

2 participants